  • #16
    The way Stata chooses which category to omit is not based on the number of observations it represents. In most circumstances it will omit the categories with the smallest numerical value. When circumstances force it to remove more categories, it tends to start from the other end. For example, in a model where i.year is a variable, normally the first year is omitted. If there is a reason a second year has to be omitted it will typically be the last year. I don't know the details of how Stata makes these choices. The important things to remember are:

    1. It doesn't matter which ones get omitted. The meanings of the coefficients that remain change, but the model's predictions are not affected by the choice.
    2. Whenever you are working with a set of indicator ("dummy") variables, or, for that matter, any set of collinear variables, the coefficients of those variables do not mean what they appear to mean in the regression output: they represent "effects" of the corresponding levels only relative to whatever has been omitted. So it is a dicey business looking at these coefficients in any case: interpreting them correctly requires care and some algebra.
    3. The outputs of -predict- and -margins-, which are the real results of these models anyway, will be the same (except perhaps for very minor rounding errors) no matter which of the collinear variables is dropped. These are the results you should be looking at in any case.

    You do have some degree of control over which categories get omitted. Look at the explanation of the ib*. operators in -help fvvarlist-. However, Stata sometimes overrides the choices you make, particularly when interactions are involved, so control is not complete. But, as already noted, it doesn't matter anyway.
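    The invariance claimed in points 1 and 3 is easy to check numerically. A minimal sketch with simulated data (not from the thread), using ordinary least squares for simplicity: whichever dummy category is omitted, the fitted values are identical.

```python
import numpy as np

# Sketch (simulated data): fitted values do not depend on which
# category of a dummy set is omitted.
rng = np.random.default_rng(0)
cat = rng.integers(0, 3, 200)                        # 3-level categorical
y = np.array([1.0, 2.5, -0.5])[cat] + rng.normal(0, 0.1, 200)

def fitted(omit):
    # constant + dummies for every level except `omit`
    cols = [np.ones(200)] + [(cat == l).astype(float)
                             for l in range(3) if l != omit]
    X = np.column_stack(cols)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return X @ beta

# Different omitted category, different coefficients, same predictions:
print(np.allclose(fitted(0), fitted(2)))  # True
```

    The coefficients themselves differ across the two fits; only their meaning relative to the omitted category changes, exactly as point 2 says.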



    • #17
      Thank you, Clyde Schechter. I really appreciate your comments and advice; they are very helpful. So, as a general rule, I should replace the results of an interaction with those reported by -margins-; is that right? Does this hold only if my regression includes just an interaction, e.g. "stcox i.rel##i.at1", or will it also hold if my regression includes other explanatory variables, e.g. "stcox i.rel##i.at1 educ empstat income"?

      Regarding these results (from Cox regression analysis):

      (1) I would expect the results of the interaction to have a negative sign, as seen in both the main effects (using ##) and the total effects (using #) in #12. Yet when I run -margins-, the sign is positive (see below). What do you suggest?

      (2) Is there an option to convert the HRs to coefficients? Usually, when I combine the results from multiple models, -esttab- automatically displays the coefficients, though I can specify HRs simply by adding "eform" to the code.

      (3) Is there a way to add the results from -margins- (and not the regression) to a table using -esttab-? I tried, but it still reported the results from the Cox regression. Or shall I add them manually?

      Code:
      . margins rel, at(at1=(0 1))
      
      Adjusted predictions                            Number of obs     =     10,272
      Model VCE    : OIM
      
      Expression   : Predicted hazard ratio, predict()
      
      1._at        : at1             =           0
      
      2._at        : at1             =           1
      
      ------------------------------------------------------------------------------
                   |            Delta-method
                   |     Margin   Std. Err.      z    P>|z|     [95% Conf. Interval]
      -------------+----------------------------------------------------------------
           _at#rel |
              1 1  |          1          .        .       .            .           .
              1 2  |   .6208963    .179913     3.45   0.001     .2682733    .9735194
              1 3  |   .2335054   .1252064     1.86   0.062    -.0118947    .4789055
              1 4  |   .5487337    .221485     2.48   0.013     .1146311    .9828364
              2 1  |          .  (not estimable)
              2 2  |   .2285016   .2326992     0.98   0.326    -.2275805    .6845837
              2 3  |   .3947063   .1519148     2.60   0.009     .0969588    .6924539
              2 4  |   .1336125   .0514121     2.60   0.009     .0328466    .2343784
      ------------------------------------------------------------------------------



      • #18
        1. The -margins- output you show is predicting the hazard ratio, not the coefficient. The coefficients have negative signs, but hazard ratios are always >= 0. Hazard ratio = exp(coefficient).

        2. Coefficient = log(hazard ratio), log being the natural logarithm.

        3. I don't know. I don't use esttab or the related programs myself, so I have only a very limited idea of what they can do. Perhaps somebody else will respond.
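        Points 1 and 2 can be verified with any calculator; a quick sketch in Python, using one of the hazard ratios from the -margins- output in #17:

```python
import math

hr = 0.6208963                 # a hazard ratio from the output above
coef = math.log(hr)            # coefficient = natural log of the HR
print(coef < 0)                # True: HR < 1 implies a negative coefficient
print(math.isclose(math.exp(coef), hr))  # True: exp() recovers the HR
```

        The same round trip works in Stata with display log(...) and display exp(...).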



        • #19
          Thank you very much Clyde Schechter. (1) Yes I understand that HRs >= 0, thank you for clarifying in (2) how to convert a HR to the coefficient (by taking the log(HR)) - this worked great [log(.1336125) = -0.87415291]. So a HR < 1 (but > 0) will provide a negative coefficient and a HR > 1 will give a positive coefficient - makes sense thanks.
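          One caution about the figure above: -0.87415291 is the base-10 logarithm of .1336125, not the natural log. The natural log, which is what the Cox coefficient equals, is about -2.013. A quick check:

```python
import math

hr = 0.1336125
print(round(math.log(hr), 4))     # natural log: -2.0128 (the Cox coefficient)
print(round(math.log10(hr), 4))   # base-10 log: -0.8742
```

          The sign argument in the post is still right either way: any HR between 0 and 1 has a negative log, and any HR above 1 has a positive log, in both bases.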

          I'm looking for an alternative program to collate results from multiple models into a single table - may I ask which program you use?



          • #20
            I generally don't use any. Occasionally, I use -esttab- or -estout- or -outreg- or -outreg2-. But I am senior enough in my position that I can usually pass that responsibility on to others, so I have not cultivated much expertise in any of these programs and don't know enough about them to make recommendations or compare their strengths and weaknesses.



            • #21
              That's fair enough. Thanks Clyde Schechter. As always, I appreciate your help.



              • #22
                Hi Clyde Schechter. I just realised that you didn't answer my question in #17 (clarity on using margins results for interactions):
                So, as a general rule, I should replace the results of an interaction with those reported by -margins-; is that right? Does this hold only if my regression includes just an interaction, e.g. "stcox i.rel##i.at1", or will it also hold if my regression includes other explanatory variables, e.g. "stcox i.rel##i.at1 educ empstat income"?
                How do I proceed if the results from -stcox- are significant, but those from -margins- are not?
                Code:
                . stcox i.rel i.at1 i.educ i.esbrd1 i.esbrd2 c.lincY c.numchild i.marstat i.marstat#i.rel 
                > if inlist(marstat, 1, 3), allbaselevels
                
                (partial results below):
                
                Cox regression -- Breslow method for ties
                
                No. of subjects =        3,168                  Number of obs    =       6,216
                No. of failures =           52
                Time at risk    =         6216
                                                                LR chi2(16)      =       67.03
                Log likelihood  =   -332.84361                  Prob > chi2      =      0.0000
                
                -----------------------------------------------------------------------------------------
                                     _t | Haz. Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
                ------------------------+----------------------------------------------------------------
                                marstat |
                          [1] de facto  |          1  (base)
                           [3] married  |   .6370129   .3292172    -0.87   0.383     .2313318    1.754128
                                        |
                            marstat#rel |
                        [1] de facto#1  |          1  (base)
                        [1] de facto#2  |          1  (base)
                        [1] de facto#3  |          1  (base)
                        [1] de facto#4  |          1  (base)
                         [3] married#1  |          1  (base)
                         [3] married#2  |   .6322774   .4359628    -0.66   0.506     .1636794    2.442425
                         [3] married#3  |   .1939949   .1519839    -2.09   0.036     .0417756     .900861
                         [3] married#4  |    .108176   .0997062    -2.41   0.016     .0177654    .6586981
                -----------------------------------------------------------------------------------------
                We see that both [3 3] and [3 4] are significant. This compares with the output from -margins-, where the results are not significant:
                Code:
                . margins rel, at(marstat=(1 3))
                
                Predictive margins                              Number of obs     =      6,216
                Model VCE    : OIM
                
                Expression   : Predicted hazard ratio, predict()
                
                1._at        : marstat         =           1
                
                2._at        : marstat         =           3
                
                ------------------------------------------------------------------------------
                             |            Delta-method
                             |     Margin   Std. Err.      z    P>|z|     [95% Conf. Interval]
                -------------+----------------------------------------------------------------
                     _at#rel |
                        1 1  |   .9725483   1.911102     0.51   0.611    -2.773142    4.718238
                        1 2  |    1.02924   2.076059     0.50   0.620    -3.039761    5.098241
                        1 3  |   2.037673   4.195771     0.49   0.627    -6.185887    10.26123
                        1 4  |   1.609832   3.410202     0.47   0.637    -5.074041    8.293704
                        2 1  |   .6195258   1.228911     0.50   0.614    -1.789096    3.028148
                        2 2  |   .4145457   .8274396     0.50   0.616    -1.207206    2.036298
                        2 3  |   .2518101   .5092139     0.49   0.621    -.7462308    1.249851
                        2 4  |   .1109327    .227281     0.49   0.625    -.3345298    .5563953
                ------------------------------------------------------------------------------
                Also, when I ran -margins- previously (as in #17), Stata provided a base level, which helps me interpret the results, but here Stata does not. Can you say why? How do I interpret the results without a base level?



                • #23
                  In general, I think the -margins- results are more understandable and more useful than what comes directly out of a regression command, certainly when there is any interaction term in the model, but often even in models with no interaction terms. With survival analyses, there are additional complications that make them a bit harder to understand.

                  When your model contains only an interaction and no other covariates, there will be a base level shown in the -margins- output. You can recognize it because its "Margin" is 1 and the standard error, z statistic, p-value, and confidence interval are all missing. When there are additional variables in the model, the base level is the situation where all of the variables are zero. Since having every variable at zero is, in many realistic situations, unusual or even impossible, it might be better to set the covariates at their means. This provides a more realistic base in most situations, and it is then easy to describe the hazard ratios shown by -margins- as ratios relative to "an average case." (That's a slight abuse of language, but it is unlikely to be misunderstood.)
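                  To see why the evaluation point matters, here is a sketch with made-up coefficients and covariate values (none of these numbers come from the thread): the predicted relative hazard is exp(xb), so it equals 1 only when the linear predictor is zero.

```python
import math

# Hypothetical Cox coefficients for two covariates (illustration only)
b_age, b_educ = 0.03, -0.2

def rel_hazard(age, educ):
    return math.exp(b_age * age + b_educ * educ)

print(rel_hazard(0, 0))          # 1.0: the all-zeros base, often unrealistic
base = rel_hazard(45, 2.0)       # base at (hypothetical) covariate means
print(round(rel_hazard(50, 2.0) / base, 3))  # 1.162: HR vs. the "average case"
```

                  In Stata, evaluating at the means is what the atmeans option of -margins- does.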



                  • #24
                    Hello all,

                    I hope this question is appropriate to put here. My thesis partner and I are having trouble including interaction terms in our two-way fixed-effects regression for panel data. The time variable is year and the panel variable is inventor. We are looking at the effect of the introduction of broadband internet on R&D collaborations and patent output. For the interaction effects we used the invt_network_size x mobile_invt interaction in the regressions of ln_npatent (log of the number of patents per inventor per year), team size, and the out-of-region dependent variables. However, running the regressions with and without these interaction terms gives very different results. For example, for the out-of-region dependent variables, broadband access is significant (***) when the interaction terms are included but not when they are excluded. We plan to include the terms, but we are unsure whether their significance reflects greater accuracy or just increased bias in the model. So the question is whether it is "safe" to include them.

                    The interaction is threefold, with dummies for small (>5), medium (>11), and large (>19) network size. The following regressions were run:
                    Team size (dummies)
                    Code:
                    xtreg tsize`i' lag_post invt_career_age mobile_invt invt_network_size invt_pat_count nwsize_small#mobile_invt nwsize_medium#mobile_invt nwsize_large#mobile_invt i.year, fe vce(robust)
                    with i = 1, 2, 3, 5, 9.
                    Geographical distance (dummies)
                    Code:
                    xtreg diff_`i' lag_post invt_career_age mobile_invt invt_network_size invt_pat_count n_invt_county nwsize_small#mobile_invt nwsize_medium#mobile_invt nwsize_large#mobile_invt i.year, fe vce(robust)
                    with i = cbsa, state, and country.

                    For example, for state we have the following output:
                    Code:
                    note: 1.nwsize_small#1.mobile_invt omitted because of collinearity
                    note: 1.nwsize_medium#1.mobile_invt omitted because of collinearity
                    note: 1.nwsize_large#1.mobile_invt omitted because of collinearity
                    note: 2008.year omitted because of collinearity
                    
                    Fixed-effects (within) regression               Number of obs     =    616,809
                    Group variable: inventor                        Number of groups  =    122,546
                    
                    R-sq:                                           Obs per group:
                         within  = 0.0298                                         min =          2
                         between = 0.0985                                         avg =        5.0
                         overall = 0.0716                                         max =         15
                    
                                                                    F(25,122545)      =     361.65
                    corr(u_i, Xb)  = -0.0407                        Prob > F          =     0.0000
                    
                                                 (Std. Err. adjusted for 122,546 clusters in inventor)
                    -------------------------------------------------------------------------------------------
                                              |               Robust
                                   diff_state |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
                    --------------------------+----------------------------------------------------------------
                                     lag_post |    .026687   .0023266    11.47   0.000      .022127     .031247
                              invt_career_age |  -.0104363   .0004212   -24.78   0.000    -.0112619   -.0096107
                                  mobile_invt |   .2422791   .0078267    30.96   0.000     .2269389    .2576193
                            invt_network_size |   .0075373   .0002032    37.09   0.000     .0071391    .0079356
                               invt_pat_count |  -.0011883   .0001259    -9.44   0.000     -.001435   -.0009416
                                n_invt_county |  -6.53e-06   6.17e-07   -10.58   0.000    -7.74e-06   -5.32e-06
                                              |
                     nwsize_small#mobile_invt |
                                         0 1  |  -.0560181   .0067566    -8.29   0.000    -.0692609   -.0427753
                                         1 0  |   .1289964   .0030096    42.86   0.000     .1230976    .1348952
                                         1 1  |          0  (omitted)
                                              |
                    nwsize_medium#mobile_invt |
                                         0 1  |  -.0348569   .0058796    -5.93   0.000    -.0463807   -.0233331
                                         1 0  |   .0615859   .0034211    18.00   0.000     .0548806    .0682912
                                         1 1  |          0  (omitted)
                                              |
                     nwsize_large#mobile_invt |
                                         0 1  |   .0166035   .0063643     2.61   0.009     .0041295    .0290775
                                         1 0  |   .0152486   .0045987     3.32   0.001     .0062352    .0242619
                                         1 1  |          0  (omitted)
                                              |
                                         year |
                                         1995 |   .0176018    .002886     6.10   0.000     .0119453    .0232584
                                         1996 |   .0091925   .0028473     3.23   0.001     .0036118    .0147733
                                         1997 |   .0240387   .0027311     8.80   0.000     .0186857    .0293917
                                         1998 |   .0233333   .0027733     8.41   0.000     .0178976    .0287689
                                         1999 |   .0353884   .0028746    12.31   0.000     .0297542    .0410225
                                         2000 |   .0369952   .0030628    12.08   0.000     .0309922    .0429981
                                         2001 |   .0361956   .0032339    11.19   0.000     .0298572     .042534
                                         2002 |   .0373603   .0034185    10.93   0.000     .0306602    .0440605
                                         2003 |   .0391099   .0035857    10.91   0.000     .0320821    .0461377
                                         2004 |   .0373599   .0038316     9.75   0.000       .02985    .0448697
                                         2005 |   .0367294   .0041126     8.93   0.000     .0286687      .04479
                                         2006 |   .0317682   .0044091     7.21   0.000     .0231265    .0404099
                                         2007 |   .0234309   .0048438     4.84   0.000     .0139371    .0329247
                                         2008 |          0  (omitted)
                                              |
                                        _cons |   .2611692    .003988    65.49   0.000     .2533528    .2689856
                    --------------------------+----------------------------------------------------------------
                                      sigma_u |  .32967806
                                      sigma_e |  .39436089
                                          rho |  .41137123   (fraction of variance due to u_i)
                    -------------------------------------------------------------------------------------------

                    We do not quite understand why the 1 1 interaction effects are omitted. Is it because we also include the variables as main effects?
                    Is it wrong to include the interaction terms when, without them, our model does not seem to be significant?
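                    One likely reason the 1 1 cells drop: once the constant and the mobile_invt main effect are in the model, the three cells of each interaction are linearly dependent with them, so the design matrix is rank-deficient and Stata must omit one column. A sketch with simulated data (stand-in variables, not the thread's data):

```python
import numpy as np

# Sketch: why one interaction cell is dropped when a main effect
# of one of the interacted dummies is also in the model.
rng = np.random.default_rng(1)
A = rng.integers(0, 2, 500)   # stand-in for nwsize_small
B = rng.integers(0, 2, 500)   # stand-in for mobile_invt

X = np.column_stack([
    np.ones(500),      # constant
    B,                 # mobile_invt main effect
    (1 - A) * B,       # interaction cell 0 1
    A * (1 - B),       # interaction cell 1 0
    A * B,             # interaction cell 1 1
])
# A*B = B - (1-A)*B, so the five columns have rank 4, not 5:
print(np.linalg.matrix_rank(X))
```

                    The omission itself is harmless, for the reasons given in #16: predictions and margins are unchanged by which collinear column gets dropped.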

