I am examining the effect of adding month fixed effects when estimating yearly wasting prevalence (i.e., the share of children who are too thin) in datasets where survey timing is not comparable across years. This is commonly done, but I have realized that in some settings it helps, and in others it does nothing or even worsens the bias! Odd. This is a rabbit hole, but I want to understand properly what is happening, so I decided to work through a set of simulated examples.
In my simulation, I create a dataset of 500 observations per year-month cell for all months from Jan 2013 - Dec 2019, and create variation in average wasting rates by month (following the pattern we see in Senegal; not important, just for the curious), with some year-to-year variation in those monthly patterns, as is realistic. I create no "true" variation in year-to-year average wasting rates. Then in part two, I show (within a preserve/restore block) what happens if you keep only a randomly chosen 3-month sample period per year (visualized in a scatterplot for quick intuition).
My goal is to compare (a) "true" wasting rates over the years (reg wasted i.year in the full simulated sample), (b) "apparent" wasting rates (reg wasted i.year in the non-comparable 3-month survey sample), and (c) "adjusted" wasting rates (reg wasted i.year i.month in the same non-comparable survey sample).
Problem: In some simulated survey draws, certain month dummies are dropped from the i.year i.month model -- I presume because they are multicollinear. That seems fine in itself: the first sampled month (usually January) is always omitted as the "base" month to which the others are compared, so conceptually the additionally omitted months simply join the first sampled month / January as the "base" month, for which wasting is predicted via _cons plus the year-specific parameter. I assume, for instance, that this is what predict does. But margins reports "not estimable" whenever a month FE is omitted from the model. Why? Adding force does not help. Am I missing something conceptual?
Two resulting questions, one coding and one conceptual:
(1) With margins out of action, I can recover year-specific predictions via predict, as shown in the simulation. But what I'd really like is to save the predictions in sim_adj. Is there some way to fix the margins problem, or otherwise save these predictions to sim_adj for subsequent graphing?
(2) Conceptually, is there some way to know in advance which 3-month sampling arrangements will lead to multicollinearity and dropped month FEs? If you re-run the preserve/restore code a dozen or so times, you'll notice that perhaps 1 in 4 or 1 in 5 arrangements does NOT require a dropped month FE. But looking at the scatterplot by eye, I can't tell which arrangements do or don't require a dropped month FE (and thus result in un-estimable margins). When month dummies are dropped, they always belong to months appearing in one year only. But not all months appearing in one year only are dropped, and the arrangements with no dropped month FE still contain months appearing in one year only. In sum, I can't work out the sampling pattern that leads to multicollinear months. Thoughts?
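For question (2), one way to probe a given sampling arrangement numerically is to build the dummy design matrix for i.year i.month and check its column rank: an extra month FE must be dropped exactly when the matrix is rank-deficient beyond the two usual base categories. Here is a small sketch of that check in Python/NumPy rather than Stata, purely so it runs standalone; the helper names (months_sampled, needs_extra_drop) are made up for illustration, and the second example arrangement is hypothetical.

```python
import numpy as np

def months_sampled(start):
    # Three consecutive calendar months starting at `start`, wrapping Dec -> Jan
    return [((start - 1 + k) % 12) + 1 for k in range(3)]

def needs_extra_drop(starts, years=tuple(range(2013, 2020))):
    # Build the design matrix for: constant + i.year + i.month, with one base
    # year and one base month already omitted. Returns True if the matrix is
    # still rank-deficient, i.e. an additional month FE would have to be
    # dropped for this sampling arrangement.
    cells = [(y, m) for y, s in zip(years, starts) for m in months_sampled(s)]
    yrs = sorted({y for y, _ in cells})
    mos = sorted({m for _, m in cells})
    X = []
    for y, m in cells:
        row = [1.0]
        row += [1.0 if y == yy else 0.0 for yy in yrs[1:]]  # year dummies
        row += [1.0 if m == mm else 0.0 for mm in mos[1:]]  # month dummies
        X.append(row)
    X = np.array(X)
    return bool(np.linalg.matrix_rank(X) < X.shape[1])

# Same Jan-Mar window every year: no extra month FE needs to be dropped
print(needs_extra_drop([1] * 7))                # False
# 2013-2018 sample Jan-Mar but 2019 samples Jul-Sep: 2019's months overlap
# with no other year's months, and an extra month FE must be dropped
print(needs_extra_drop([1, 1, 1, 1, 1, 1, 7]))  # True
```

This only detects the problem for a given draw; it does not by itself explain the visual pattern, but re-running it over many random arrangements may make the pattern easier to see than the scatterplot does.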
Code:
********************************************************************************
*** Create full simulated wasting dataset (500 obs per month-year)
********************************************************************************
clear all
set seed 1234
set obs 7
gen year = _n + 2012
expand 12
bysort year: gen month = _n
expand 500
sort year month
gen wasted = 0
local i = 1
forval y = 2013/2019 {
    gen x = runiform(0,1)
    gen e = runiform(-.005,.005)
    replace wasted = 100 if month==2  & x<=(.0354394+e) & year==`y'
    replace wasted = 100 if month==3  & x<=(.0322906+e) & year==`y'
    replace wasted = 100 if month==4  & x<=(.0291154+e) & year==`y'
    replace wasted = 100 if month==5  & x<=(.0344103+e) & year==`y'
    replace wasted = 100 if month==6  & x<=(.0417116+e) & year==`y'
    replace wasted = 100 if month==7  & x<=(.0561161+e) & year==`y'
    replace wasted = 100 if month==8  & x<=(.0418273+e) & year==`y'
    replace wasted = 100 if month==9  & x<=(.0463994+e) & year==`y'
    replace wasted = 100 if month==10 & x<=(.0596474+e) & year==`y'
    replace wasted = 100 if month==11 & x<=(.0385433+e) & year==`y'
    replace wasted = 100 if month==12 & x<=(.0344972+e) & year==`y'
    drop x e
}

* Similar seasonal wasting patterns in each year
twoway (lpoly wasted month if year==2013) (lpoly wasted month if year==2014) ///
    (lpoly wasted month if year==2015) (lpoly wasted month if year==2016) ///
    (lpoly wasted month if year==2017) (lpoly wasted month if year==2018) ///
    (lpoly wasted month if year==2019)

* But no change in average wasting over the years
twoway lpolyci wasted year

********************************************************************************
* "True" wasting rates by year in full sample
********************************************************************************
reg wasted i.year
margins year, saving(sim_original, replace)

********************************************************************************
** Randomly chosen survey periods by year
********************************************************************************
preserve
bysort year: gen m1 = round(runiform(1,12)) if _n==1
bysort year: egen M1 = max(m1)
drop m1
gen M2 = M1 + 1
replace M2 = 1 if M2==13
gen M3 = M2 + 1
replace M3 = 1 if M3==13
replace M3 = 2 if M3==14
gen SAMP = month==M1 | month==M2 | month==M3
keep if SAMP==1

* Visual of the sampled months per year
twoway scatter year month, ylabel(2013(1)2019) xlabel(1(1)12) msize(large) ///
    name(adj_setup`i', replace)

* How well do the months explain the years
reg year i.month
local R = e(r2)

* "Apparent" wasting rates under 3-month uneven sampling
reg wasted i.year
margins year, saving(sim_year, replace)

* "Adjusted" wasting rates from adding month FE under 3-month uneven sampling
reg wasted i.year i.month
margins year, saving(sim_adj, replace) force

* Notice that the "adjusted" prediction works, so why not margins??
* I assume prediction is using _cons for pooled set of omitted months
predict what
bysort year: sum what
restore