Bug in margins command when using "post"?

Jeff Wooldridge

Join Date: Apr 2014
Posts: 2170

Bug in margins command when using "post"?

08 Nov 2021, 10:37

I love the margins option but it can also be a source of frustration. For example, in perfectly well-behaved nonlinear models, such as logit or poisson with lots of dummy variables and interactions, I find myself having to use the "noestimcheck" option to get the average partial effects and standard errors. Probably there's a solution that I'm not aware of. I can live with that because it appears to produce the correct estimates when I've checked. My latest discovery about margins makes it very inefficient to do a simulation study using even logit or poisson.

This has nothing to do with a particular data set. I've tried it on several generated data sets and with different nonlinear estimation commands. I'll describe the problem first and then show Stata commands/output. If I use margins after, say, logit, and want to evaluate the effect at different settings of the covariates, it seems to work fine as long as I don't use the "post" option. But in a simulation I've always had to use the "post" option to access the estimated APEs. When I use that option to get a second APE at another setting of the covariates, I get the message "option dydx() not allowed" -- when, of course, I know that option is allowed. If I reestimate the model then the problem goes away. But that means that if I want, say, six APEs then I need to estimate the model six times. Clearly this shouldn't be necessary and it's a big hurdle for simulations. If I don't use "post" the problem doesn't arise, but then I can't (or don't know how) to dynamically retrieve the APEs.

This has become a hurdle in my recent work on nonlinear difference-in-differences estimation.

Here's a simple example with T = 2 generated panel data. Without "post" everything is fine. If I reestimate the model then the problem goes way, but that can't be intended. There must be something I don't know about margins and post. Do I have to somehow drop the old coefficient stored in _b[1.w] before computing the next marginal effect? Appreciate any help!

Code:

. logit y i.w x i.w#c.x f02

Iteration 0:   log likelihood = -686.40401  
Iteration 1:   log likelihood = -660.20325  
Iteration 2:   log likelihood = -660.18981  
Iteration 3:   log likelihood = -660.18981  

Logistic regression                                     Number of obs =  1,000
                                                        LR chi2(4)    =  52.43
                                                        Prob > chi2   = 0.0000
Log likelihood = -660.18981                             Pseudo R2     = 0.0382

------------------------------------------------------------------------------
           y | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
         1.w |   -1.06079   .3509639    -3.02   0.003    -1.748667   -.3729136
           x |   .0157663   .1231613     0.13   0.898    -.2256253    .2571579
             |
       w#c.x |
          1  |    .326336   .2400305     1.36   0.174    -.1441151    .7967871
             |
         f02 |   1.031331    .151108     6.83   0.000     .7351643    1.327497
       _cons |  -.6603075   .1491261    -4.43   0.000    -.9525892   -.3680257
------------------------------------------------------------------------------

. margins, dydx(w) at(f02 = 1)

Average marginal effects                                 Number of obs = 1,000
Model VCE: OIM

Expression: Pr(y), predict()
dy/dx wrt:  1.w
At: f02 = 1

------------------------------------------------------------------------------
             |            Delta-method
             |      dy/dx   std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
         1.w |  -.1857112   .0502812    -3.69   0.000    -.2842605    -.087162
------------------------------------------------------------------------------
Note: dy/dx for factor levels is the discrete change from the base level.

. margins, dydx(w) at(f02 = 0)

Average marginal effects                                 Number of obs = 1,000
Model VCE: OIM

Expression: Pr(y), predict()
dy/dx wrt:  1.w
At: f02 = 0

------------------------------------------------------------------------------
             |            Delta-method
             |      dy/dx   std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
         1.w |  -.1438547    .033023    -4.36   0.000    -.2085786   -.0791309
------------------------------------------------------------------------------
Note: dy/dx for factor levels is the discrete change from the base level.

. margins, dydx(w) at(f02 = 1) post

Average marginal effects                                 Number of obs = 1,000
Model VCE: OIM

Expression: Pr(y), predict()
dy/dx wrt:  1.w
At: f02 = 1

------------------------------------------------------------------------------
             |            Delta-method
             |      dy/dx   std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
         1.w |  -.1857112   .0502812    -3.69   0.000    -.2842605    -.087162
------------------------------------------------------------------------------
Note: dy/dx for factor levels is the discrete change from the base level.

. di _b[1.w]
-.18571124

. margins, dydx(w) at(f02 = 0) post
option dydx() not allowed
r(198);

. qui logit y i.w x i.w#c.x f02

. margins, dydx(w) at(f02 = 0) post

Average marginal effects                                 Number of obs = 1,000
Model VCE: OIM

Expression: Pr(y), predict()
dy/dx wrt:  1.w
At: f02 = 0

------------------------------------------------------------------------------
             |            Delta-method
             |      dy/dx   std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
         1.w |  -.1438547    .033023    -4.36   0.000    -.2085786   -.0791309
------------------------------------------------------------------------------
Note: dy/dx for factor levels is the discrete change from the base level.

Tags: None

Andrew Musau

Join Date: Oct 2014

Posts: 10213
#2

08 Nov 2021, 10:47

Jeff Wooldridge, once you use the -post- option, it overwrites the previous estimation results. I would recommend that you do the following:

Code:

logit y i.w x i.w#c.x f02 estimates store res1 margins, dydx(w) at(f02 = 0) post est restore res1 margins, dydx(w) at(f02 = 0) post
2 likes
Comment
Clyde Schechter

Join Date: Apr 2014

Posts: 30116
#3

08 Nov 2021, 10:47

What is going on here is that when you use -margins, post-, the -post- option causes Stata to overwrite the original regression estimates with the -margins- estimates. The original regression estimates now having disappeared, -margins- can't be asked to calculate new estimates based on the original regression.

I can think of two fairly simple workarounds that will spare you from having to rerun the regression:

1. Before running -margins, post-, use -estimates store- to preserve the estimates, and then -estimates restore- after you are done with the -margins, post- results.

2. -margins-, with or without the -post- options, leaves its results behind in matrix r(table). Depending on what you need for your simulations, the results you want may well be there. If so, you can run -margins- without post, store r(table) in a matrix, do your simulation-related calculations from that matrix. And then you are free to run a new -margins- command from the original regression results, which have been preserved.

Added: Crossed with #2, which also recommends the first of these two solutions.
1 like
Comment

Fei Wang

Join Date: Oct 2021
Posts: 726

08 Nov 2021, 10:56

Based on Clyde's solution, please find an example below.

Code:

. webuse lbw, clear
(Hosmer & Lemeshow data)

. logit low age

Iteration 0:   log likelihood =   -117.336  
Iteration 1:   log likelihood = -115.96259  
Iteration 2:   log likelihood = -115.95598  
Iteration 3:   log likelihood = -115.95598  

Logistic regression                             Number of obs     =        189
                                                LR chi2(1)        =       2.76
                                                Prob > chi2       =     0.0966
Log likelihood = -115.95598                     Pseudo R2         =     0.0118

------------------------------------------------------------------------------
         low |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |  -.0511529   .0315138    -1.62   0.105    -.1129188    .0106129
       _cons |   .3845819   .7321251     0.53   0.599    -1.050357    1.819521
------------------------------------------------------------------------------

. 
. margins, dydx(age) at(age = 20)

Conditional marginal effects                    Number of obs     =        189
Model VCE    : OIM

Expression   : Pr(low), predict()
dy/dx w.r.t. : age
at           : age             =          20

------------------------------------------------------------------------------
             |            Delta-method
             |      dy/dx   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |  -.0115686   .0074507    -1.55   0.120    -.0261717    .0030344
------------------------------------------------------------------------------

. disp r(table)[1,1]
-.01156864

. 
. margins, dydx(age) at(age = 25)

Conditional marginal effects                    Number of obs     =        189
Model VCE    : OIM

Expression   : Pr(low), predict()
dy/dx w.r.t. : age
at           : age             =          25

------------------------------------------------------------------------------
             |            Delta-method
             |      dy/dx   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |  -.0105374   .0062086    -1.70   0.090     -.022706    .0016311
------------------------------------------------------------------------------

. disp r(table)[1,1]
-.01053744

Comment

Jeff Wooldridge

Join Date: Apr 2014

Posts: 2170
#5

08 Nov 2021, 11:29

Much appreciated everyone! Those are both easy solutions. Probably go with the store and restore option. As usual, not as hard as I envisioned ....
Comment
Joerg Luedicke (StataCorp)

StataCorp Employee

Join Date: Apr 2014

Posts: 116
#6

08 Nov 2021, 11:58

Hi Jeff,

Based on your example, it seems there is a third solution which might be even more straightforward. Rather than calling margins repeatedly, you could estimate the APEs at specified covariate values in one go by specifying the at() option repeatedly. That is, rather than

Code:

. margins, dydx(w) at(f02 = 1) . margins, dydx(w) at(f02 = 0)

you could do:

Code:

. margins, dydx(w) at(f02 = 1) at(f02 = 0)

and then grab the results from r(table) or use post and grab them from e().

Best,
Joerg
2 likes
Comment
Jeff Wooldridge

Join Date: Apr 2014

Posts: 2170
#7

08 Nov 2021, 13:19

Thanks Joerg! That might speed things up a bit, too.
1 like
Comment
John Mullahy

Join Date: Dec 2016

Posts: 752
#8

09 Nov 2021, 06:53

One additional idea is to use the undocumented generate(...) option. At each replication you can create a new variable (or temporary variable) whose average (suitably defined) is the APE.

Code:

margins, dydx(...) at(...) gen(var1) margins, dydx(...) at(...) gen(var2)

See

Code:

help margins_generate
1 like
Comment
Jeff Wooldridge

Join Date: Apr 2014

Posts: 2170
#9

09 Nov 2021, 08:28

Thanks John! I might try that because for the nonlinear diff-in-diffs stuff I'm doing I need to average across different subpopulations, so using what Joerg suggested doesn't help in this case.

It seems that "store" and "restore" isn't as fast as I hoped, as the simulations seem to take almost as long as re-estimating the equation each time. Or I'm making a simple mistake. I really need to up my programming skills ....
1 like
Comment

KitBaum

Join Date: Mar 2014
Posts: 48

#10

15 Nov 2021, 19:33

I would not use the margins post option if you are trying to run a set of margins for various combinations of the RHS variables on the same model. I would also avoid estimates store/restore, as if you do not use the post option, the estimates will still be available. The best advice, given above by Joerg, is to use the r(table) returned by margins, from which you can grab the b, se, and if you wish the z and p-value. For example:

Code:

clear all
webuse nlsw88, clear
qui logit union wage collgrad south smsa married, nolog
foreach v of varlist collgrad south smsa married {
    margins, dydx(wage) at(`v'=0)
    mat b = (nullmat(b),r(table)[1,1])    
    mat se = (nullmat(se),r(table)[2,1])
    loc cn " `cn' `v'_0"
    margins, dydx(wage) at(`v'=1)
    mat b = b, r(table)[1,1]
    mat se = se, r(table)[2,1]
    loc cn "`cn' `v'_1"
}
mat bse = (b \ se)'
mat colnames bse = wage stderr
mat rownames bse = `cn'
matlist bse

which will produce

Code:

. matlist bse

             |      wage     stderr 
-------------+----------------------
  collgrad_0 |  .0087987   .0023539 
  collgrad_1 |   .010111   .0025131 
     south_0 |  .0103897   .0026827 
     south_1 |  .0073798   .0020113 
      smsa_0 |  .0085979   .0023554 
      smsa_1 |  .0093163   .0024055 
   married_0 |  .0097416   .0025402 
   married_1 |  .0087618   .0023057

Announcement

Bug in margins command when using "post"?

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment