  • Margins

    Using Stata 13.1 under Windows 7.1. I'm trying to understand margins better, so I ran one of the examples from logit:

    Code:
    . webuse lbw
    (Hosmer & Lemeshow data)
    
    . logit low age lwt i.race smoke ptl ht ui
    
    Iteration 0:   log likelihood =   -117.336
    Iteration 1:   log likelihood = -101.28644
    Iteration 2:   log likelihood = -100.72617
    Iteration 3:   log likelihood =   -100.724
    Iteration 4:   log likelihood =   -100.724
    
    Logistic regression                               Number of obs   =        189
                                                      LR chi2(8)      =      33.22
                                                      Prob > chi2     =     0.0001
    Log likelihood =   -100.724                       Pseudo R2       =     0.1416
    
    ------------------------------------------------------------------------------
             low |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
             age |  -.0271003   .0364504    -0.74   0.457    -.0985418    .0443412
             lwt |  -.0151508   .0069259    -2.19   0.029    -.0287253   -.0015763
                 |
            race |
          black  |   1.262647   .5264101     2.40   0.016     .2309024    2.294392
          other  |   .8620792   .4391532     1.96   0.050     .0013548    1.722804
                 |
           smoke |   .9233448   .4008266     2.30   0.021      .137739    1.708951
             ptl |   .5418366    .346249     1.56   0.118     -.136799    1.220472
              ht |   1.832518   .6916292     2.65   0.008     .4769494    3.188086
              ui |   .7585135   .4593768     1.65   0.099    -.1418484    1.658875
           _cons |   .4612239    1.20459     0.38   0.702    -1.899729    2.822176
    ------------------------------------------------------------------------------

    As a test, this works ok:

    Code:
    . margins race ,atmeans
    
    Adjusted predictions                              Number of obs   =        189
    Model VCE    : OIM
    
    Expression   : Pr(low), predict()
    at           : age             =     23.2381 (mean)
                   lwt             =    129.8201 (mean)
                   1.race          =    .5079365 (mean)
                   2.race          =    .1375661 (mean)
                   3.race          =    .3544974 (mean)
                   smoke           =    .3915344 (mean)
                   ptl             =    .1957672 (mean)
                   ht              =    .0634921 (mean)
                   ui              =    .1481481 (mean)
    
    ------------------------------------------------------------------------------
                 |            Delta-method
                 |     Margin   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
            race |
          white  |    .191685   .0454474     4.22   0.000     .1026096    .2807603
          black  |   .4560013    .107471     4.24   0.000      .245362    .6666405
          other  |   .3596187   .0695116     5.17   0.000     .2233784     .495859
    ------------------------------------------------------------------------------
    However, this throws an error:


    Code:
    . margins smoke ,atmeans
    factor 'smoke' not found in list of covariates
    r(322);

    Not sure why. The coefficient table shows smoke present:

    Code:
    . matrix list r(table)
    
    r(table)[9,10]
                   low:        low:        low:        low:        low:        low:        low:        low:        low:        low:
                                            1b.          2.          3.                                                          
                   age         lwt        race        race        race       smoke         ptl          ht          ui       _cons
         b  -.02710031  -.01515082           0   1.2626473   .86207916   .92334482   .54183656   1.8325178   .75851348   .46122388
        se   .03645043   .00692588           .   .52641014   .43915315   .40082664     .346249   .69162923   .45937677   1.2045897
         z  -.74348404  -2.1875663           .   2.3985998    1.963049   2.3036014   1.5648755   2.6495667   1.6511794   .38288876
    pvalue   .45718868   .02870121           .   .01645789   .04964048   .02124503   .11761211   .00805951   .09870194   .70180224
        ll  -.09854183  -.02872529           .   .23090236   .00135479   .13773904    -.136799   .47694941  -.14184845  -1.8997286
        ul   .04434121  -.00157635           .   2.2943922   1.7228035   1.7089506   1.2204721   3.1880862   1.6588754   2.8221764
        df           .           .           .           .           .           .           .           .           .           .
      crit    1.959964    1.959964    1.959964    1.959964    1.959964    1.959964    1.959964    1.959964    1.959964    1.959964
     eform           0           0           0           0           0           0           0           0           0           0

    What am I missing?

  • #2
    Only factor variables (that is, variables entered with factor-variable notation in the estimation command) can go to the left of the comma in margins. So race, which was entered as i.race, is OK; smoke is not. You could do something like

    margins, dydx(smoke)

    if you wanted. Here is an overview of margins:

    http://www3.nd.edu/~rwilliam/stats/Margins01.pdf
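
    A minimal sketch of what can be run with the model exactly as fitted above (smoke entered without i.). The at() values and the use of atmeans are only for illustration:

    Code:
    * smoke was entered as a continuous covariate, so it goes in an option
    margins, dydx(smoke) atmeans
    
    * adjusted predictions at smoke = 0 and smoke = 1, other covariates at their means
    margins, at(smoke=(0 1)) atmeans

    Because smoke is treated as continuous here, dydx(smoke) reports a derivative rather than the discrete change from 0 to 1.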
    -------------------------------------------
    Richard Williams, Notre Dame Dept of Sociology
    EMAIL: rwilliam@ND.Edu
    WWW: http://www3.nd.edu/~rwilliam



    • #3
      Also, if you had instead specified i.smoke in the logit command, margins smoke would work. Even if a variable is already a dichotomy, you have to use factor-variable notation in the estimation command so that Stata knows it is a categorical variable.
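
      A short sketch of that refit, using the same lbw example as the original post:

      Code:
      webuse lbw, clear
      
      * i.smoke tells Stata that smoke is categorical
      logit low age lwt i.race i.smoke ptl ht ui
      
      * smoke can now go to the left of the comma
      margins smoke, atmeans
      
      * with i.smoke in the model, dydx() reports the discrete change from 0 to 1
      margins, dydx(smoke) atmeans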
      -------------------------------------------
      Richard Williams, Notre Dame Dept of Sociology
      EMAIL: rwilliam@ND.Edu
      WWW: http://www3.nd.edu/~rwilliam



      • #4
        smoke is entered as a continuous variable in your logit, so Stata throws an error. Typically, with continuous variables there are too many values to calculate the average predicted value at each distinct value, though that is not the case here since smoke is binary. Stata is being conservative, but I think the guard rail is a useful one since it forces you to be explicit about how your variables enter the equation. The solution here is to add the i. prefix.
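
        For a covariate that really is continuous, such as age, one way to get predictions at a handful of values is the at() option. A sketch, with the grid of ages chosen arbitrarily for illustration:

        Code:
        * predictions at ages 15, 20, ..., 45, other covariates at their means
        margins, at(age=(15(5)45)) atmeans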
        Last edited by Dimitriy V. Masterov; 07 May 2015, 12:44.



        • #5
          Ok. I understand now. Thank you.

          Here's a follow-up question. I'm working on an age-period-cohort analysis. Using glm followed by margins works well (assuming svy set):

          Code:
          svy, subpop(domain): glm y a i.b i.age, family(binomial) link(logit) iterate(20)
          
          margins age, atmeans
          I realize that some may take issue with this methodology, preferring marginal means, but this seems to be the way the APC literature handles things.

          For the full APC analysis, I'm using a modification of the apc_ie (http://econpapers.repec.org/software...de/s456754.htm) module that allows full survey design information to be incorporated:

          Code:
          apc_ie4 y a b1 b2 b3, age(age) period(period) cohort(cohort) family(binomial) link(logit) iterate(20) svyopts("svy, subpop(domain)")
          Although it is based on glm, it does not allow factor variables. I don't know whether this is because of the principal components on which it's based or some other reason, but I'm wondering if there is some way to group the age, period, and cohort variables, and any covariates, that would allow computing marginal probabilities similar to those for age alone, or age and period, etc. The other problem is that postestimation is not available after running this module, so margins cannot be run anyway. Does anyone have a solution?



          • #6
            apc_ie was written long before factor variables were part of Stata. I suppose you could try to add support yourself. See

            http://www.stata.com/support/faqs/pr...iable-support/

            It does say it is a wrapper for Stata's glm command, so maybe it wouldn't be that hard. Or, just figure out what the wrapper is doing and maybe you can use glm directly.
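
            As a rough illustration of what factor-variable support in a wrapper looks like, here is a sketch of a hypothetical program (myglm_fv is a made-up name, and this is not the apc_ie code, whose internals I have not checked). It only shows the syntax pieces involved:

            Code:
            * hypothetical wrapper around glm, NOT the actual apc_ie code
            program define myglm_fv, eclass
                version 13
                * fv lets syntax accept factor-variable notation such as i.age
                syntax varlist(fv) [if] [in] [, *]
                gettoken depvar indepvars : varlist
                * the dependent variable itself must not be a factor variable
                _fv_check_depvar `depvar'
                * pass everything through to glm, which posts the e() results
                glm `depvar' `indepvars' `if' `in', `options'
            end

            With something like this, myglm_fv y i.age, family(binomial) link(logit) should leave glm's results in e(), so margins could be run afterwards. Whatever extra steps apc_ie performs would have to be handled separately.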

            I would consider starting a new thread that included apc_ie in the title. If there is an apc_ie expert out there, they may not be paying any attention to this thread.
            -------------------------------------------
            Richard Williams, Notre Dame Dept of Sociology
            EMAIL: rwilliam@ND.Edu
            WWW: http://www3.nd.edu/~rwilliam



            • #7
              Thanks for the suggestions. Modifying the ado file is probably beyond my level of skill. The author already helped me add survey capability to the program, and it was quite a challenge, even though it was not that difficult. I think that adding factor variable capability is difficult from a conceptual and practical standpoint. And I don't think there are enough users of this module to garner much help elsewhere.

              I'm wondering if it would be easier to compute the probabilities directly using nlcom. The only question is how to obtain the means for the covariates; I'm not sure summarize will give me the correct numbers. predict does not work after this module.
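
              To make the nlcom idea concrete, here is a sketch using the lbw logit from the start of this thread rather than the APC model. It assumes the estimation command leaves e(b) and e(V) behind, which I cannot confirm for apc_ie4, and the macro names m_age, m_lwt, etc. are made up for the example:

              Code:
              webuse lbw, clear
              logit low age lwt i.race smoke ptl ht ui
              
              * save the estimation-sample mean of each covariate in a local macro
              foreach v in age lwt smoke ptl ht ui {
                  quietly summarize `v' if e(sample)
                  local m_`v' = r(mean)
              }
              
              * predicted probability for race = white (the base level) at the means
              nlcom (white: invlogit(_b[_cons] + _b[age]*`m_age' + _b[lwt]*`m_lwt' ///
                  + _b[smoke]*`m_smoke' + _b[ptl]*`m_ptl' + _b[ht]*`m_ht' + _b[ui]*`m_ui'))

              The result should be close to the .1917 that margins race, atmeans reported for white. With survey data, plain summarize ignores the design, so the means would presumably have to come from something like svy, subpop(domain): mean, run and saved before the model so that the model's e() results are not overwritten.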



              • #8
                Bill, you mentioned that the author helped you add survey capability to the program. Would you mind sharing this with me? I am also using this package with my survey data. Thank you in advance.
