  • How to use STATA for comparing nested models with survey design

    Hi all,
    I ran into a problem when using Stata to compare two multinomial logistic regression models under a survey design: one model is constrained (the coefficient of an independent variable is forced to be equal across two outcome equations). My code looks like this:

    Code:
    * declare the survey design: two-stage design
    svyset [w=weights12], psu(sdmvpsu) strata(sdmvstra)

    * set the constraint
    constraint 1 [1=2]:hyperchol

    * unconstrained model
    svy: mlogit db_cat hypertension hyperchol

    * store the unconstrained model
    estimates store unconstrained

    * constrained model
    svy: mlogit db_cat hypertension hyperchol, constraints(1)

    * test the constrained against the unconstrained model
    lrtest unconstrained

    Without the survey design part this works fine, but with "svy" I get the error "lrtest is not appropriate with survey estimation results". I searched online and someone suggested using "test", but the examples are all about setting the coefficients of one or more variables to 0, not setting them equal. So does anyone know how to test the difference between two models under a survey design?

    Thank you very much!!

    Thomas


  • #2
    Is my question clear enough? Please let me know. I think this is a very interesting question.
    Last edited by Xin Yang; 06 Oct 2014, 16:44.



    • #3
      Hi Thomas. First off, check the FAQ for suggestions on how to pose your question more effectively. It is strongly preferred that members register with their real name (e.g. Thomas Yang). You can ask to change your username by clicking on the Contact Us button.

      Also, the FAQ discusses how to use CODE blocks; they are much easier to read than commands pasted as plain text.

      Now, to get to your actual question: You want to use Wald tests, which you can do with the test and testparm commands. For example,

      Code:
      * nhanes2f comes with its survey design already declared via svyset
      webuse nhanes2f, clear
      svy: logit diabetes i.female i.black
      * joint test that all listed coefficients are equal to one another
      testparm i.female i.black, equal
      * test a single equality; coef also displays the constrained estimates
      test 1.female = 1.black
      test 1.female = 1.black, coef
      Basically, you estimate the unconstrained model, and then use test or testparm to test the constraints you are interested in. With test, you can (often) add the coef option to see what the constrained estimates look like.
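
      To connect this to the original question, a minimal sketch for the multinomial model might look like the following (the equation names in brackets are assumptions taken from the constraint [1=2] in the original post; check the equation headers in the mlogit output and substitute the actual names):

      Code:
      * unconstrained model under the survey design
      svy: mlogit db_cat hypertension hyperchol
      * Wald test that the hyperchol coefficient is equal in equations 1 and 2
      test [1]hyperchol = [2]hyperchol
      * same test, also displaying the constrained estimates
      test [1]hyperchol = [2]hyperchol, coef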
      -------------------------------------------
      Richard Williams, Notre Dame Dept of Sociology
      StataNow Version: 19.5 MP (2 processor)

      EMAIL: [email protected]
      WWW: https://www3.nd.edu/~rwilliam



      • #4
        This handout summarizes a few of the key differences you have to be aware of when using the svy: prefix as opposed to when you don't use it. http://www3.nd.edu/~rwilliam/stats3/SvyCDACautions.pdf
        -------------------------------------------
        Richard Williams, Notre Dame Dept of Sociology
        StataNow Version: 19.5 MP (2 processor)

        EMAIL: [email protected]
        WWW: https://www3.nd.edu/~rwilliam



        • #5
          Thanks Richard. I did some research, and I am not sure whether this code gives the right test:
          Code:
          webuse nhanes2f, clear
          svy: mlogit region i.female i.black
          test [NE]1.female = [MW]1.female, coef
          The result of the Wald test is as follows:

          HTML Code:
          Adjusted Wald test
          
           ( 1)  [NE]1.female - [MW]1.female = 0
          
                 F(  1,    31) =    0.66
                      Prob > F =    0.4233
          
          
          Constrained coefficients
          
          ------------------------------------------------------------------------------
                       |             Linearized
                       |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
          -------------+----------------------------------------------------------------
          NE           |
              1.female |  -.0162441   .0625491    -0.26   0.795    -.1388381      .10635
               1.black |  -.1208884   .5369585    -0.23   0.822    -1.173308    .9315308
                 _cons |  -.2906148   .0683456    -4.25   0.000    -.4245698   -.1566598
          -------------+----------------------------------------------------------------
          MW           |
              1.female |  -.0162441   .0625491    -0.26   0.795    -.1388381      .10635
               1.black |     .66285   .5190526     1.28   0.202    -.3544743    1.680174
                 _cons |  -.1551524   .0669777    -2.32   0.021    -.2864263   -.0238785
          -------------+----------------------------------------------------------------
          S            |
              1.female |   .0351555   .0732598     0.48   0.631    -.1084312    .1787421
               1.black |   1.361956   .5449428     2.50   0.012     .2938878    2.430024
                 _cons |  -.2080088    .102116    -2.04   0.042    -.4081525   -.0078652
          -------------+----------------------------------------------------------------
          W            |
              1.female |          0  (omitted)
          So can I interpret this as "fail to reject the hypothesis that the coefficient of 1.female is the same for NE and MW in the multinomial logistic regression"?
          Thanks



          • #6
            I am not sure why you would want to test such a hypothesis. But if that is what you want to test, you appear to be doing so correctly. A nice thing about the coef option is that it lets you see if what you wanted to test is indeed what you ended up testing.
            Last edited by Richard Williams; 07 Oct 2014, 14:45.
            -------------------------------------------
            Richard Williams, Notre Dame Dept of Sociology
            StataNow Version: 19.5 MP (2 processor)

            EMAIL: [email protected]
            WWW: https://www3.nd.edu/~rwilliam



            • #7
              My purpose is to test whether an independent variable has the same effect on different levels of the response variable in a multinomial logistic regression. For example, whether the variable "gender" contributes equally to the outcomes hypertension and pre-hypertension (normal is the base).



              • #8
                Originally posted by Xin Yang View Post
                My purpose is to test whether an independent variable has the same effect on different levels of the response variable in a multinomial logistic regression. For example, whether the variable "gender" contributes equally to the outcomes hypertension and pre-hypertension (normal is the base).
                Fair enough, but there are many ways of defining "effect," and what looks like the same "effect" in one definition may be different "effects" in others. What you have succeeded in doing is showing that gender similarly affects the two conditions in the metric of log risk ratios. But if the prevalence of the two conditions in the reference category differs, then you would find you have a different effect of gender if you went back and assessed the effect as risk difference. So, in describing your findings, you have to be very specific about what "same effect" means, because same effect in one scale of measurement will correspond to different effects in other scales of measurement not linearly related to the first.

                Let me also challenge another aspect of your approach. Hypertension and pre-hypertension are categorical variables derived by applying cutpoints to a continuous measurement: blood pressure. The use of these categories is of value for clinical decision making (and mandatory for billing fees), but statistically, it is simply throwing away information. Without knowing the full context of your research, I cannot make firm assertions, but there is a good chance that you would gain more insight into the epidemiology of blood pressure, if that is your goal, by using blood pressure itself as the dependent variable, and looking at the effect of gender on that.



                • #9
                  Originally posted by Clyde Schechter View Post

                  Fair enough, but there are many ways of defining "effect," and what looks like the same "effect" in one definition may be different "effects" in others. What you have succeeded in doing is showing that gender similarly affects the two conditions in the metric of log risk ratios. But if the prevalence of the two conditions in the reference category differs, then you would find you have a different effect of gender if you went back and assessed the effect as risk difference. So, in describing your findings, you have to be very specific about what "same effect" means, because same effect in one scale of measurement will correspond to different effects in other scales of measurement not linearly related to the first.

                  Let me also challenge another aspect of your approach. Hypertension and pre-hypertension are categorical variables derived by applying cutpoints to a continuous measurement: blood pressure. The use of these categories is of value for clinical decision making (and mandatory for billing fees), but statistically, it is simply throwing away information. Without knowing the full context of your research, I cannot make firm assertions, but there is a good chance that you would gain more insight into the epidemiology of blood pressure, if that is your goal, by using blood pressure itself as the dependent variable, and looking at the effect of gender on that.
                  Very good points, thank you Clyde! This is the right place to have a discussion. I totally agree with you about blood pressure; it may not be best to do it this way. There may be other situations, though, where the response is categorical and I want to test whether the coefficient of an independent variable is the same for some levels of the response. Your point about "effect" is also very insightful.



                  • #10
                    I wanted to resurrect this thread because I have a similar but slightly different question. If I fit a model that includes only main effects and then a second model that adds one or more interaction terms, I often want to look at an overall model fit statistic such as the change in the likelihood ratio or in BIC, which I prefer to AIC. When the models are run on survey data, however, there is no longer a likelihood ratio reflecting overall model fit, and consequently no BIC either. Even if the added interaction term is significantly different from 0, it may not be worth the loss of degrees of freedom and might make only a marginal contribution to the overall fit. I have often used fitstat to compare models, but it is unavailable after estimation on survey data.

                    Is there any user written procedure or Stata post-estimation procedure that approximates the LR-based comparisons of model fit after running models using the svy prefix?

                    Thanks.



                    • #11
                      I only occasionally work with survey data, so the fact that I don't know of anything that fits this description doesn't mean it doesn't exist. Perhaps somebody else will point something out.

                      But I'll use this as an occasion to argue that these likelihood-based (or other significance-test-based) approaches to deciding which interactions (if any) to include are not a good idea. It is a commonplace that frequentist test statistics are sample-size sensitive. The same is true of likelihood-based statistics like AIC and BIC. Take the same data and just copy it over a few times (or even just once), re-run the analysis, and your LR improves (dragging the AIC and BIC with it).

                      Now here's the snag: in most studies, by design, there is just enough data to power a reasonable test of a main effect. From which it inevitably follows that the power for interaction tests is way too low. Here's the best-case scenario: your interacted variables are both dichotomous with 50% 1s and 50% 0s. Then the interaction term has 25% 1s and 75% 0s. The "effective sample size" for a ttest (which, in effect, is what you are doing when you test an interaction term for significance, even if in the particular model at hand it's a chi-square or a z test) is 1/(1/n1 + 1/n2), and that is dominated by the smaller of n1 and n2. So your effective sample size for this kind of test is at best half that of a test for either main effect. And you don't have to move very far from that best case of a 50/50 split on both variables to come out with pathetic effective sample sizes. And if any of the variables is polychotomous, the situation is even worse. If the sample was collected with a view to adequate power for one of these main effects, as is typically the case in my work, then a sample that is at best effectively half as large is simply out of its league!

                      There is even more reason not to rely on underpowered tests in the interaction-inclusion context. That is because in many contexts, although we are not practicing Bayesian statistics, the prior probability that there is no interaction is quite small. For example, in epidemiology, if one of the variables is an age-group indicator (I know, I'm always ranting that we should use age itself and not arbitrarily categorize it, but sometimes that's the way the data come to you), it is a bold and radical assertion to claim that any effect of any variable on any outcome is independent of age. Biology just doesn't work that way! Not everything is as ubiquitous an effect modifier as age, but many things come close. Using an underpowered statistic as a basis for failing to reject a null hypothesis that has very little plausibility is not a great idea.

                      For that reason, even when I'm dealing with simple random samples and have plenty of test statistics at my disposal, I rarely use them to decide on which interactions to keep. Rather, I look at model predictions with and without the interaction terms to see if the differences are large enough to matter from a practical perspective. This is inherently a judgment call, and when my own content-area background is insufficient to make that kind of call with confidence, I confer with people with more expertise in the content to help. Doing this actually helps both ways: in addition to avoiding an underpowered failure to reject a preposterous null hypothesis, if you happen to be endowed with a sample that is enormous, it will also prevent you from keeping some tiny, utterly unimportant interaction that nobody could possibly care about, but that happens to be "statistically significant" because you're working with a gigantic data set.
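
                      For what it's worth, here is one way that comparison of predictions might be sketched in Stata, reusing the nhanes2f example from earlier in the thread (the model and variables are purely illustrative, not a recommendation for any particular analysis):

                      Code:
                      webuse nhanes2f, clear
                      * predicted probabilities from the main-effects-only model
                      svy: logit diabetes i.female i.black
                      margins female#black, post
                      estimates store main_only
                      * predicted probabilities from the model with the interaction added
                      svy: logit diabetes i.female##i.black
                      margins female#black, post
                      estimates store with_inter
                      * compare the two sets of predictions side by side and judge whether
                      * the differences are large enough to matter substantively
                      estimates table main_only with_inter, b(%9.4f) se

                      Whether the differences matter is, as noted above, a substantive judgment rather than a statistical test.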



                      • #12
                        Thanks very much for that thoughtful response, Clyde. (Only about 7,108 more posts and I will have caught up with you.)

                        I understand what you are saying and have to confess that I never thought about power and how it might affect these tests. That said, in my own work I am usually working with very large samples, such as aggregated years of the National Survey on Drug Use and Health; I have at times done analyses with well over 100,000 cases. I assume that would qualify as gigantic. The NSDUH, by the way, provides only a categorical variable for age, in the interest of protecting the anonymity of participants. I'm not even sure whether they provide actual age in the restricted-use data sets. I think they are being a bit over-protective with the data, but it's not my call.

                        The potential underpowering does not even bother me that much, as I am firmly on the side of preferring a Type II error to a Type I error; in this one area, I am a conservative. That said, you make a great point about comparing the predictions with and without the interaction terms to see whether the interactions make a practical difference. I suspect that in many cases statistically significant interactions do not make a practical difference, and it is possible, I suppose, that non-significant ones do, although since power is tied to effect size, a non-significant result would suggest a smaller effect unless the power is very low. I do plot the interactions post-estimation to see what they look like, but that is cruder than actually comparing the sets of predictions.

                        I will consider this further. One thing I am glad about so far is that I have not yet seen, anywhere, a simple solution to the issue of obtaining model comparison statistics that adjust for survey design effects post-estimation.

                        Thanks very much again. I really enjoy the deliberation.



                        • #13
                          A user on the ResearchGate discussion board, where someone else posted a question similar to mine, received this response:

                          Instead of using the svy prefix you can specify your models using the weight and robust options. This will yield exactly the same results. Doing so, you can use the estat ic command after running your model to obtain BIC and AIC values.

                          Example:
                          Code:
                          logit y a b c [pweight=weight], robust
                          estat ic
                          Is it true that running a model this way will yield "exactly the same results" as running the model using svyset with pweights and a PSU specified? That seems unlikely to me.



                          • #14
                            No, it is definitely not true. It would only be the case if there is no stratification and no primary sampling units.
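
                            A quick way to see the difference for yourself, sketched here with the nhanes2f data used earlier in the thread (finalwgt is assumed to be that dataset's sampling-weight variable): the coefficients from the two approaches should agree, but the standard errors will generally differ once the strata and PSUs are brought into the picture.

                            Code:
                            webuse nhanes2f, clear
                            * weights plus robust SEs only -- ignores strata and PSUs
                            logit diabetes i.female i.black [pweight=finalwgt], vce(robust)
                            * full design: weights, strata, and PSUs already declared via svyset
                            svy: logit diabetes i.female i.black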

