Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Bug in margins command when using "post"?

    I love the margins option but it can also be a source of frustration. For example, in perfectly well-behaved nonlinear models, such as logit or poisson with lots of dummy variables and interactions, I find myself having to use the "noestimcheck" option to get the average partial effects and standard errors. Probably there's a solution that I'm not aware of. I can live with that because it appears to produce the correct estimates when I've checked. My latest discovery about margins makes it very inefficient to do a simulation study using even logit or poisson.

    This has nothing to do with a particular data set. I've tried it on several generated data sets and with different nonlinear estimation commands. I'll describe the problem first and then show Stata commands/output. If I use margins after, say, logit, and want to evaluate the effect at different settings of the covariates, it seems to work fine as long as I don't use the "post" option. But in a simulation I've always had to use the "post" option to access the estimated APEs. When I use that option to get a second APE at another setting of the covariates, I get the message "option dydx() not allowed" -- when, of course, I know that option is allowed. If I reestimate the model then the problem goes away. But that means that if I want, say, six APEs then I need to estimate the model six times. Clearly this shouldn't be necessary and it's a big hurdle for simulations. If I don't use "post" the problem doesn't arise, but then I can't (or don't know how) to dynamically retrieve the APEs.

    This has become a hurdle in my recent work on nonlinear difference-in-differences estimation.

    Here's a simple example with T = 2 generated panel data. Without "post" everything is fine. If I reestimate the model then the problem goes way, but that can't be intended. There must be something I don't know about margins and post. Do I have to somehow drop the old coefficient stored in _b[1.w] before computing the next marginal effect? Appreciate any help!

    Code:
    . logit y i.w x i.w#c.x f02
    
    Iteration 0:   log likelihood = -686.40401  
    Iteration 1:   log likelihood = -660.20325  
    Iteration 2:   log likelihood = -660.18981  
    Iteration 3:   log likelihood = -660.18981  
    
    Logistic regression                                     Number of obs =  1,000
                                                            LR chi2(4)    =  52.43
                                                            Prob > chi2   = 0.0000
    Log likelihood = -660.18981                             Pseudo R2     = 0.0382
    
    ------------------------------------------------------------------------------
               y | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
    -------------+----------------------------------------------------------------
             1.w |   -1.06079   .3509639    -3.02   0.003    -1.748667   -.3729136
               x |   .0157663   .1231613     0.13   0.898    -.2256253    .2571579
                 |
           w#c.x |
              1  |    .326336   .2400305     1.36   0.174    -.1441151    .7967871
                 |
             f02 |   1.031331    .151108     6.83   0.000     .7351643    1.327497
           _cons |  -.6603075   .1491261    -4.43   0.000    -.9525892   -.3680257
    ------------------------------------------------------------------------------
    
    . margins, dydx(w) at(f02 = 1)
    
    Average marginal effects                                 Number of obs = 1,000
    Model VCE: OIM
    
    Expression: Pr(y), predict()
    dy/dx wrt:  1.w
    At: f02 = 1
    
    ------------------------------------------------------------------------------
                 |            Delta-method
                 |      dy/dx   std. err.      z    P>|z|     [95% conf. interval]
    -------------+----------------------------------------------------------------
             1.w |  -.1857112   .0502812    -3.69   0.000    -.2842605    -.087162
    ------------------------------------------------------------------------------
    Note: dy/dx for factor levels is the discrete change from the base level.
    
    . margins, dydx(w) at(f02 = 0)
    
    Average marginal effects                                 Number of obs = 1,000
    Model VCE: OIM
    
    Expression: Pr(y), predict()
    dy/dx wrt:  1.w
    At: f02 = 0
    
    ------------------------------------------------------------------------------
                 |            Delta-method
                 |      dy/dx   std. err.      z    P>|z|     [95% conf. interval]
    -------------+----------------------------------------------------------------
             1.w |  -.1438547    .033023    -4.36   0.000    -.2085786   -.0791309
    ------------------------------------------------------------------------------
    Note: dy/dx for factor levels is the discrete change from the base level.
    
    . margins, dydx(w) at(f02 = 1) post
    
    Average marginal effects                                 Number of obs = 1,000
    Model VCE: OIM
    
    Expression: Pr(y), predict()
    dy/dx wrt:  1.w
    At: f02 = 1
    
    ------------------------------------------------------------------------------
                 |            Delta-method
                 |      dy/dx   std. err.      z    P>|z|     [95% conf. interval]
    -------------+----------------------------------------------------------------
             1.w |  -.1857112   .0502812    -3.69   0.000    -.2842605    -.087162
    ------------------------------------------------------------------------------
    Note: dy/dx for factor levels is the discrete change from the base level.
    
    . di _b[1.w]
    -.18571124
    
    . margins, dydx(w) at(f02 = 0) post
    option dydx() not allowed
    r(198);
    
    . qui logit y i.w x i.w#c.x f02
    
    . margins, dydx(w) at(f02 = 0) post
    
    Average marginal effects                                 Number of obs = 1,000
    Model VCE: OIM
    
    Expression: Pr(y), predict()
    dy/dx wrt:  1.w
    At: f02 = 0
    
    ------------------------------------------------------------------------------
                 |            Delta-method
                 |      dy/dx   std. err.      z    P>|z|     [95% conf. interval]
    -------------+----------------------------------------------------------------
             1.w |  -.1438547    .033023    -4.36   0.000    -.2085786   -.0791309
    ------------------------------------------------------------------------------
    Note: dy/dx for factor levels is the discrete change from the base level.

  • #2
    Jeff Wooldridge, once you use the -post- option, it overwrites the previous estimation results. I would recommend that you do the following:

    Code:
    logit y i.w x i.w#c.x f02
    estimates store res1
    margins, dydx(w) at(f02 = 0) post
    est restore res1
    margins, dydx(w) at(f02 = 0) post

    Comment


    • #3
      What is going on here is that when you use -margins, post-, the -post- option causes Stata to overwrite the original regression estimates with the -margins- estimates. The original regression estimates now having disappeared, -margins- can't be asked to calculate new estimates based on the original regression.

      I can think of two fairly simple workarounds that will spare you from having to rerun the regression:

      1. Before running -margins, post-, use -estimates store- to preserve the estimates, and then -estimates restore- after you are done with the -margins, post- results.

      2. -margins-, with or without the -post- options, leaves its results behind in matrix r(table). Depending on what you need for your simulations, the results you want may well be there. If so, you can run -margins- without post, store r(table) in a matrix, do your simulation-related calculations from that matrix. And then you are free to run a new -margins- command from the original regression results, which have been preserved.

      Added: Crossed with #2, which also recommends the first of these two solutions.

      Comment


      • #4
        Based on Clyde's solution, please find an example below.

        Code:
        . webuse lbw, clear
        (Hosmer & Lemeshow data)
        
        . logit low age
        
        Iteration 0:   log likelihood =   -117.336  
        Iteration 1:   log likelihood = -115.96259  
        Iteration 2:   log likelihood = -115.95598  
        Iteration 3:   log likelihood = -115.95598  
        
        Logistic regression                             Number of obs     =        189
                                                        LR chi2(1)        =       2.76
                                                        Prob > chi2       =     0.0966
        Log likelihood = -115.95598                     Pseudo R2         =     0.0118
        
        ------------------------------------------------------------------------------
                 low |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
        -------------+----------------------------------------------------------------
                 age |  -.0511529   .0315138    -1.62   0.105    -.1129188    .0106129
               _cons |   .3845819   .7321251     0.53   0.599    -1.050357    1.819521
        ------------------------------------------------------------------------------
        
        . 
        . margins, dydx(age) at(age = 20)
        
        Conditional marginal effects                    Number of obs     =        189
        Model VCE    : OIM
        
        Expression   : Pr(low), predict()
        dy/dx w.r.t. : age
        at           : age             =          20
        
        ------------------------------------------------------------------------------
                     |            Delta-method
                     |      dy/dx   Std. Err.      z    P>|z|     [95% Conf. Interval]
        -------------+----------------------------------------------------------------
                 age |  -.0115686   .0074507    -1.55   0.120    -.0261717    .0030344
        ------------------------------------------------------------------------------
        
        . disp r(table)[1,1]
        -.01156864
        
        . 
        . margins, dydx(age) at(age = 25)
        
        Conditional marginal effects                    Number of obs     =        189
        Model VCE    : OIM
        
        Expression   : Pr(low), predict()
        dy/dx w.r.t. : age
        at           : age             =          25
        
        ------------------------------------------------------------------------------
                     |            Delta-method
                     |      dy/dx   Std. Err.      z    P>|z|     [95% Conf. Interval]
        -------------+----------------------------------------------------------------
                 age |  -.0105374   .0062086    -1.70   0.090     -.022706    .0016311
        ------------------------------------------------------------------------------
        
        . disp r(table)[1,1]
        -.01053744

        Comment


        • #5
          Much appreciated everyone! Those are both easy solutions. Probably go with the store and restore option. As usual, not as hard as I envisioned ....

          Comment


          • #6
            Hi Jeff,

            Based on your example, it seems there is a third solution which might be even more straightforward. Rather than calling margins repeatedly, you could estimate the APEs at specified covariate values in one go by specifying the at() option repeatedly. That is, rather than
            Code:
            . margins, dydx(w) at(f02 = 1)
            . margins, dydx(w) at(f02 = 0)
            you could do:
            Code:
            . margins, dydx(w) at(f02 = 1) at(f02 = 0)
            and then grab the results from r(table) or use post and grab them from e().

            Best,
            Joerg

            Comment


            • #7
              Thanks Joerg! That might speed things up a bit, too.

              Comment


              • #8
                One additional idea is to use the undocumented generate(...) option. At each replication you can create a new variable (or temporary variable) whose average (suitably defined) is the APE.
                Code:
                margins, dydx(...) at(...) gen(var1)
                margins, dydx(...) at(...) gen(var2)
                See
                Code:
                help margins_generate

                Comment


                • #9
                  Thanks John! I might try that because for the nonlinear diff-in-diffs stuff I'm doing I need to average across different subpopulations, so using what Joerg suggested doesn't help in this case.

                  It seems that "store" and "restore" isn't as fast as I hoped, as the simulations seem to take almost as long as re-estimating the equation each time. Or I'm making a simple mistake. I really need to up my programming skills ....

                  Comment


                  • #10
                    I would not use the margins post option if you are trying to run a set of margins for various combinations of the RHS variables on the same model. I would also avoid estimates store/restore, as if you do not use the post option, the estimates will still be available. The best advice, given above by Joerg, is to use the r(table) returned by margins, from which you can grab the b, se, and if you wish the z and p-value. For example:

                    Code:
                    clear all
                    webuse nlsw88, clear
                    qui logit union wage collgrad south smsa married, nolog
                    foreach v of varlist collgrad south smsa married {
                        margins, dydx(wage) at(`v'=0)
                        mat b = (nullmat(b),r(table)[1,1])    
                        mat se = (nullmat(se),r(table)[2,1])
                        loc cn " `cn' `v'_0"
                        margins, dydx(wage) at(`v'=1)
                        mat b = b, r(table)[1,1]
                        mat se = se, r(table)[2,1]
                        loc cn "`cn' `v'_1"
                    }
                    mat bse = (b \ se)'
                    mat colnames bse = wage stderr
                    mat rownames bse = `cn'
                    matlist bse
                    which will produce

                    Code:
                    . matlist bse
                    
                                 |      wage     stderr 
                    -------------+----------------------
                      collgrad_0 |  .0087987   .0023539 
                      collgrad_1 |   .010111   .0025131 
                         south_0 |  .0103897   .0026827 
                         south_1 |  .0073798   .0020113 
                          smsa_0 |  .0085979   .0023554 
                          smsa_1 |  .0093163   .0024055 
                       married_0 |  .0097416   .0025402 
                       married_1 |  .0087618   .0023057

                    Comment

                    Working...
                    X